Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome.

Authors

  • Elliott H Margulies
  • Gregory M Cooper
  • George Asimenos
  • Daryl J Thomas
  • Colin N Dewey
  • Adam Siepel
  • Ewan Birney
  • Damian Keefe
  • Ariel S Schwartz
  • Minmei Hou
  • James Taylor
  • Sergey Nikolaev
  • Juan I Montoya-Burgos
  • Ari Löytynoja
  • Simon Whelan
  • Fabio Pardi
  • Tim Massingham
  • James B Brown
  • Peter Bickel
  • Ian Holmes
  • James C Mullikin
  • Abel Ureta-Vidal
  • Benedict Paten
  • Eric A Stone
  • Kate R Rosenbloom
  • W James Kent
  • Gerard G Bouffard
  • Xiaobin Guan
  • Nancy F Hansen
  • Jacquelyn R Idol
  • Valerie V B Maduro
  • Baishali Maskeri
  • Jennifer C McDowell
  • Morgan Park
  • Pamela J Thomas
  • Alice C Young
  • Robert W Blakesley
  • Donna M Muzny
  • Erica Sodergren
  • David A Wheeler
  • Kim C Worley
  • Huaiyang Jiang
  • George M Weinstock
  • Richard A Gibbs
  • Tina Graves
  • Robert Fulton
  • Elaine R Mardis
  • Richard K Wilson
  • Michele Clamp
  • James Cuff
  • Sante Gnerre
  • David B Jaffe
  • Jean L Chang
  • Kerstin Lindblad-Toh
  • Eric S Lander
  • Angie Hinrichs
  • Heather Trumbower
  • Hiram Clawson
  • Ann Zweig
  • Robert M Kuhn
  • Galt Barber
  • Rachel Harte
  • Donna Karolchik
  • Matthew A Field
  • Richard A Moore
  • Carrie A Matthewson
  • Jacqueline E Schein
  • Marco A Marra
  • Stylianos E Antonarakis
  • Serafim Batzoglou
  • Nick Goldman
  • Ross Hardison
  • David Haussler
  • Webb Miller
  • Lior Pachter
  • Eric D Green
  • Arend Sidow
Abstract

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
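The overlap statistic quoted above (the share of constrained sequence that does not overlap any experimentally identified functional element) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the half-open coordinate convention, and the single-sequence simplification are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single sequence.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bases(a: List[Interval], b: List[Interval]) -> int:
    """Count bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_unannotated(constrained: List[Interval],
                         functional: List[Interval]) -> float:
    """Fraction of constrained bases not covered by any functional element."""
    merged = merge_intervals(constrained)
    total = sum(end - start for start, end in merged)
    if total == 0:
        raise ValueError("no constrained bases supplied")
    return (total - overlap_bases(merged, functional)) / total


if __name__ == "__main__":
    # Hypothetical toy coordinates, not data from the paper.
    constrained = [(0, 100), (200, 300)]
    functional = [(50, 120), (250, 260)]
    print(fraction_unannotated(constrained, functional))
```

The same function, called with the two arguments swapped, gives the complementary measure: the fraction of annotated functional sequence that falls outside constrained regions.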

Similar resources

Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes.

Here, we demonstrate how comparative sequence analysis facilitates genome-wide base-pair-level interpretation of individual genetic variation and address two questions of importance for human personal genomics: first, whether an individual's functional variation comes mostly from noncoding or coding polymorphisms; and, second, whether population-specific or globally-present polymorphisms contri...

Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species

Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to D genome, which is an important wild bio-source for the cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information are a versatile tool for species identification and phylogenetic implications in plants. Different ch...

A network of conserved co-occurring motifs for the regulation of alternative splicing

Cis-acting short sequence motifs play important roles in alternative splicing. It is now possible to identify such sequence motifs as conserved sequence patterns in genome sequence alignments. Here, we report the systematic search for motifs in the neighboring introns of alternatively spliced exons by using comparative analysis of mammalian genome alignments. We identified 11 conserved sequence...

Molecular analysis of AbOmpA type-1 as immunogenic target for therapeutic interventions against MDR Acinetobacter baumannii infection

Introduction: Acinetobacter baumannii is associated with hospital-acquired infections. Outer membrane protein A of A.baumannii (AbOmpA) is a well-characterized virulence factor which has important roles in pathogenesis of this bacterium. Methods: Based on our PCR-sequencing of ompA gene in the clinical isolates, AbOmpA protein can be categorized into two types, named here type-1 and type-2. We ...

What fraction of the human genome is functional?

Many evolutionary studies over the past decade have estimated α(sel), the proportion of all nucleotides in the human genome that are subject to purifying selection because of their biological function. Most of these studies have estimated the nucleotide substitution rates from genome sequence alignments across many diverse mammals. Some α(sel) estimates will be affected by the heterogeneity of ...

Journal:
  • Genome Research

Volume 17, Issue 6

Pages -

Publication date: 2007